{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pyhton数据结构和算法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python提供了大量的内置数据结构，包括列表，集合以及字典。大多数情况下使用这些数据结构是很简单的。但是，我们也会经常碰到到诸如查询，排序和过滤等等这些普遍存在的问题。因此，这一章的目的就是讨论这些比较常见的问题和算法。另外，我们也会给出在集合模块`collections`当中操作这些数据结构的方法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 解压序列赋值给多个变量"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "任何的序列(或者是可迭代对象)可以通过一个简单的赋值语句解压并赋值给多个变量。唯一的前提就是变量的数量必须跟序列元素的数量是一样的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "p = (4, 5)\n",
    "x, y = p\n",
    "x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'ABCD'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = ['ABCD', 50, 91.1, (2013, 12, 21)]\n",
    "name, shares, price, data = data\n",
    "name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2013, 12, 21)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'ABCD'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = ['ABCD', 50, 91.1, (2013, 12, 21)]\n",
    "name, shares, price, (year, mon, day) = data\n",
    "name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2013"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "12"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mon"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "21"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "day"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果变量个数和序列元素的个数不匹配，会产生一个异常。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "not enough values to unpack (expected 3, got 2)",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-9-3d0a414c5e7f>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[0mp\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m(\u001b[0m\u001b[1;36m4\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m5\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mx\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0my\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mx\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mp\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mValueError\u001b[0m: not enough values to unpack (expected 3, got 2)"
     ]
    }
   ],
   "source": [
    "p = (4, 5)\n",
    "x, y, x = p"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "实际上，这种解压赋值可以用在任何可迭代对象上面，而不仅仅是列表或者元组。包括字符串，文件对象，迭代器和生成器。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'h'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = 'hello'\n",
    "a, b, c, d, e = s\n",
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'l'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'l'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有时候，可能只想解压一部分，丢弃其他的值。对于这种情况Python并没有提供特殊的语法。但是可以使用任意变量名去占位，到时候丢掉这些变量就行了。\n",
    "\n",
    "必须保证你选用的那些占位变量名在其他地方没被使用到。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = ['name', 50, 91.1, (2012, 12, 21)]\n",
    "_, shares, price, _ = data\n",
    "shares"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "91.1"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "price"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 解压可迭代对象赋值给多个变量"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "排除掉第一个和最后一个分数，取中间数的平均值。Python的星号表达式可以用来解决这个问题。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def drop_first_last(grades):\n",
    "    first, *middle, last = grades\n",
    "    return avg(middle)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从一个列表中获取不确定数量的值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'dave'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "record = ('dave', 'dave@example.com', '773-555-1212', '846-123-4562')\n",
    "name, email, *phone_numbers = record\n",
    "name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'dave@example.com'"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "email"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['773-555-1212', '846-123-4562']"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "phone_numbers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "值得注意的是上面解压出的`phone_numbers`变量永远都是列表类型，不管解压的电话号码数量是多少(包括0个)。所以，任何使用到`phone_numbers`变量的代码就不需要做多余的类型检查去确认它是否是列表类型了。\n",
    "\n",
    "星号表达式也能用在列表的开始部分。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "*trailing, current = [10, 8, 7, 1, 9, 5, 10, 3, 2, 4, 6]\n",
    "current"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[10, 8, 7, 1, 9, 5, 10, 3, 2, 4]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trailing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "星号解压语法在字符串操作的时候也会很有用，比如字符串的分割。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'nobody'"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'\n",
    "uname, *fields, homedir, sh = line.split(':')\n",
    "uname"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'/var/empty'"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "homedir"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'/usr/bin/false'"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有时候，想解压一些元素后丢弃它们，不能简单就使用`*`，但是可以使用一个普通的废弃名称，比如`_`或者`ign`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'name'"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "record = ('name', 50, 123.45, (12, 28, 2013))\n",
    "name, *_, (*_, year) = record\n",
    "name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2013"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "year"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将一个列表分割成前后两部分。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "items = [1, 10, 7, 4, 5, 9]\n",
    "head, *tail = items\n",
    "head"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[10, 7, 4, 5, 9]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tail"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "用这种分割语法去巧妙的实现递归算法。\n",
    "\n",
    "需要注意的是，由于语言层面的限制，递归并不是Python擅长的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "36"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def sum(items):\n",
    "    head, *tail = items\n",
    "    return head + sum(tail) if tail else head\n",
    "\n",
    "sum(items)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 保留最后N个元素"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用`deque(maxlen=N)`构造函数会新建一个固定大小的队列。当新的元素加入并且这个队列已满的时候，最老的元素会自动被移除掉。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "deque([1, 2, 3])"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import deque\n",
    "q = deque(maxlen = 3)\n",
    "q.append(1)\n",
    "q.append(2)\n",
    "q.append(3)\n",
    "q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "deque([2, 3, 4])"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.append(4)\n",
    "q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "deque([3, 4, 5])"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.append(5)\n",
    "q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果不设置最大队列大小，那么就会得到一个无限大小队列，可以在队列的两端执行添加和弹出元素的操作。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "deque([1, 2, 3])"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q = deque()\n",
    "q.append(1)\n",
    "q.append(2)\n",
    "q.append(3)\n",
    "q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "deque([4, 1, 2, 3])"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.appendleft(4)\n",
    "q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.pop()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "deque([4, 1, 2])"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.popleft()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在队列两端插入或删除元素时间复杂度都是`O(1)`，而在列表的开头插入或删除元素的时间复杂度为`O(N)`。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 查找最大或最小的N个元素"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`heapq`模块有两个函数：`nlargest()`和`nsmallest()`可以完美解决这个问题。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[42, 37, 23]\n"
     ]
    }
   ],
   "source": [
    "import heapq\n",
    "nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]\n",
    "print(heapq.nlargest(3, nums))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[-4, 1, 2]\n"
     ]
    }
   ],
   "source": [
    "print(heapq.nsmallest(3, nums))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接受一个关键字参数，用于更复杂的数据结构中。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'name': 'YHOO', 'price': 16.35, 'shares': 45},\n",
       " {'name': 'FB', 'price': 21.09, 'shares': 200},\n",
       " {'name': 'HPQ', 'price': 31.75, 'shares': 35}]"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "portfolio = [\n",
    "    {'name': 'IBM', 'shares': 100, 'price': 91.1},\n",
    "    {'name': 'AAPL', 'shares': 50, 'price': 543.22},\n",
    "    {'name': 'FB', 'shares': 200, 'price': 21.09},\n",
    "    {'name': 'HPQ', 'shares': 35, 'price': 31.75},\n",
    "    {'name': 'YHOO', 'shares': 45, 'price': 16.35},\n",
    "    {'name': 'ACME', 'shares': 75, 'price': 115.65}\n",
    "]\n",
    "\n",
    "cheap = heapq.nsmallest(3, portfolio, key = lambda s: s['price'])\n",
    "expensive = heapq.nlargest(3, portfolio, key = lambda s: s['price'])\n",
    "\n",
    "cheap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'name': 'AAPL', 'price': 543.22, 'shares': 50},\n",
       " {'name': 'ACME', 'price': 115.65, 'shares': 75},\n",
       " {'name': 'IBM', 'price': 91.1, 'shares': 100}]"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "expensive"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在一个集合中查找最小或最大的N个元素，并且N小于集合元素数量，那么这些函数提供了很好的性能。因为在底层实现里面，首先会先将集合数据进行堆排序后放入一个列表中。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import heapq\n",
    "\n",
    "nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]\n",
    "heapq.heapify(nums)\n",
    "nums"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "堆数据结构最重要的特征是`heap[0]`永远是最小的元素。并且剩余的元素可以很容易的通过调用`heapq.heappop()`方法得到，该方法会先将第一个元素弹出来，然后用下一个最小的元素来取代被弹出元素(时间复杂度仅仅是`O(N)`，N是堆大小)。\n",
    "\n",
    "想要查找最小的3个元素，你可以这样做："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-4"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "heapq.heappop(nums)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "heapq.heappop(nums)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "heapq.heappop(nums)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当要查找的元素个数相对比较小的时候，函数`nlargest()`和`nsmallest()`是最合适的。\n",
    "\n",
    "如果仅仅想查找唯一的最小或最大(N=1)的元素的话，那么使用`min()`和`max()`函数会更快些。\n",
    "\n",
    "如果N的大小和集合大小接近的时候，通常先排序这个集合然后再使用切片操作会更快点(`sorted(items)[:N]`或者是`sorted(items)[-N:]`)。\n",
    "\n",
    "需要在正确场合使用函数`nlargest()`和`nsmallest()`才能发挥它们的优势(如果N快接近集合大小了，那么使用排序操作会更好些)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.5 实现一个优先级队列"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "利用`heapq`模块实现了一个简单的优先级队列："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "import heapq\n",
    "class PriorityQueue:\n",
    "    def __init__(self):\n",
    "        self._queue = []\n",
    "        self._index = 0\n",
    "        \n",
    "    def push(self, item, priority):\n",
    "        heapq.heappush(self._queue, (-priority, self._index, item))\n",
    "        self._index += 1\n",
    "        \n",
    "    def pop(self):\n",
    "        return heapq.heappop(self._queue)[-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用方式："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Item('bar')"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class Item:\n",
    "    def __init__(self, name):\n",
    "        self.name = name\n",
    "        \n",
    "    def __repr__(self):\n",
    "        return 'Item({!r})'.format(self.name)\n",
    "    \n",
    "q = PriorityQueue()\n",
    "q.push(Item('foo'), 1)\n",
    "q.push(Item('bar'), 5)\n",
    "q.push(Item('spam'), 4)\n",
    "q.push(Item('grok'), 1)\n",
    "q.pop()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Item('spam')"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.pop()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Item('foo')"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.pop()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Item('grok')"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.pop()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第一个`pop()`操作返回优先级最高的元素。如果两个有着相同优先级的元素(`foo`和`grok`)，pop操作按照它们被插入到队列的顺序返回的。\n",
    "\n",
    "函数`heapq.heappush()`和`heapq.heappop()`分别在队列`_queue`上插入和删除第一个元素，并且队列_queue保证第一个元素拥有最小优先级\n",
    "\n",
    "`heappop()`函数总是返回“最小的”的元素，这就是保证队列pop操作返回正确元素的关键。\n",
    "\n",
    "由于push和pop操作时间复杂度为O(N)，其中N是堆的大小，因此就算是N很大的时候它们运行速度也依旧很快。\n",
    "\n",
    "队列包含了一个`(-priority,index,item)`的元组。优先级为负数的目的是使得元素按照优先级从高到低排序。\n",
    "\n",
    "`index`变量的作用是保证同等优先级元素的正确排序。通过保存一个不断增加的`index`下标变量，可以确保元素安装它们插入的顺序排序。而且，`index`变量也在相同优先级元素比较的时候起到重要作用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'<' not supported between instances of 'Item' and 'Item'",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-50-222f9c0f0d57>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[0ma\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mItem\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'foo'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      2\u001b[0m \u001b[0mb\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mItem\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'bar'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0ma\u001b[0m \u001b[1;33m<\u001b[0m \u001b[0mb\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m: '<' not supported between instances of 'Item' and 'Item'"
     ]
    }
   ],
   "source": [
    "a = Item('foo')\n",
    "b = Item('bar')\n",
    "a < b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "先假定`Item`实例是不支持排序的。如果你使用元组`(priority,item)`，只要两个元素的优先级不同就能比较。但是如果两个元素优先级一样的话，那么比较操作就会跟之前一样出错。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = (1, Item('foo'))\n",
    "b = (5, Item('bar'))\n",
    "a < b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'<' not supported between instances of 'Item' and 'Item'",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-52-6faed17700c8>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[0mc\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m(\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mItem\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'grok'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0ma\u001b[0m \u001b[1;33m<\u001b[0m \u001b[0mc\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m: '<' not supported between instances of 'Item' and 'Item'"
     ]
    }
   ],
   "source": [
    "c = (1, Item('grok'))\n",
    "a < c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过引入另外的`index`变量组成三元组`(priority,index,item)`，就能很好的避免上面的错误，因为不可能有两个元素有相同的`index`值。\n",
    "\n",
    "Python在做元组比较时候，如果前面的比较以及可以确定结果了，后面的比较操作就不会发生了："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = (1, 0, Item('foo')) \n",
    "b = (5, 1, Item('bar'))\n",
    "c = (1, 2, Item('grok')) \n",
    "a < b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a < c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6 字典中的键映射多个值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一个字典就是一个键对应一个单值的映射。如果你想要一个键映射多个值，那么你就需要将这多个值放到另外的容器中，比如列表或者集合里面。\n",
    "\n",
    "选择使用列表还是集合取决于你的实际需求。如果想保持元素的插入顺序就应该使用列表，如果想去掉重复元素就使用集合（并且不关心元素的顺序问题）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "d = {\n",
    "    'a': [1, 2, 3],\n",
    "    'b': [4, 5]\n",
    "}\n",
    "e = {\n",
    "    'a': {1, 2, 3},\n",
    "    'b': {4, 5}\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以使用`collections`模块中的`defaultdict`来构造这样的字典。`defaultdict`的一个特征是它会自动初始化每个`key`刚开始对应的值，所以你只需要关注添加元素操作了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "defaultdict(list, {'a': [1, 2], 'b': [4]})"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "d = defaultdict(list)\n",
    "d['a'].append(1)\n",
    "d['a'].append(2)\n",
    "d['b'].append(4)\n",
    "d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "defaultdict(set, {'a': {1, 2}, 'b': {4}})"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = defaultdict(set)\n",
    "d['a'].add(1)\n",
    "d['a'].add(2)\n",
    "d['b'].add(4)\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`defaultdict`会自动为将要访问的键(就算目前字典中并不存在这样的键)创建映射实体。如果你并不需要这样的特性，你可以在一个普通的字典上使用`setdefault()`方法来代替。\n",
    "\n",
    "`setdefault()`用起来有点别扭，每次调用都得创建一个新的初始值的实例(例子程序中的空列表`[]`)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a': [1, 2], 'b': [4]}"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = {}\n",
    "d.setdefault('a', []).append(1)\n",
    "d.setdefault('a', []).append(2)\n",
    "d.setdefault('b', []).append(4)\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "创建一个多值映射字典。对于值的初始化可能会有点麻烦，你可能会像下面这样来实现："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d = {}\n",
    "for key, value in pairs:\n",
    "    if key not in d:\n",
    "        d[key] = []\n",
    "    d[key].append(value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果使用`defaultdict`的话代码就更加简洁了："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d = defaultdict(list)\n",
    "for key, value in pairs:\n",
    "    d[key].append(value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.7 字典排序"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "创建一个字典，并且在迭代或序列化这个字典的时候能够控制元素的顺序，可以使用`collections`模块中的`OrderedDict`类。在迭代操作的时候它会保持元素被插入时的顺序。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "foo 1\n",
      "bar 2\n",
      "spam 3\n",
      "grok 4\n"
     ]
    }
   ],
   "source": [
    "from collections import OrderedDict\n",
    "\n",
    "def orderd_dict():\n",
    "    d = OrderedDict()\n",
    "    d['foo'] = 1\n",
    "    d['bar'] = 2\n",
    "    d['spam'] = 3\n",
    "    d['grok'] = 4\n",
    "    \n",
    "    for key in d:\n",
    "        print(key, d[key])\n",
    "        \n",
    "d = orderd_dict()\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当你想要构建一个将来需要序列化或编码成其他格式的映射的时候，`OrderedDict`是非常有用的。比如，你想精确控制以JSON编码后字段的顺序，你可以先使用`OrderedDict`来构建这样的数据："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'{\"foo\": 1, \"bar\": 2, \"spam\": 3, \"grok\": 4}'"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import json\n",
    "\n",
    "def orderd_dict():\n",
    "    d = OrderedDict()\n",
    "    d['foo'] = 1\n",
    "    d['bar'] = 2\n",
    "    d['spam'] = 3\n",
    "    d['grok'] = 4\n",
    "    return d\n",
    "\n",
    "json.dumps(orderd_dict())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`OrderedDict`内部维护着一个根据键插入顺序排序的双向链表。每次当一个新的元素插入进来的时候，它会被放到链表的尾部。对于一个已经存在的键的重复赋值不会改变键的顺序。\n",
    "\n",
    "需要注意的是，一个`OrderedDict`的大小是一个普通字典的两倍，因为它内部维护着另外一个链表。所以如果你要构建一个需要大量`OrderedDict`实例的数据结构的时候(比如读取100,000行CSV数据到一个OrderedDict列表中去)，那么你就得仔细权衡一下是否使用OrderedDict带来的好处要大过额外内存消耗的影响。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.8 字典的运算"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了对字典值执行计算操作，通常需要使用`zip()`函数先将键和值反转过来。比如，下面是查找最小和最大股票价格和股票值的代码："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10.75, 'FB')"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prices = {\n",
    "    'ACME': 45.23,\n",
    "    'AAPL': 612.78,\n",
    "    'IBM': 205.55,\n",
    "    'HPQ': 37.20,\n",
    "    'FB': 10.75\n",
    "}\n",
    "min_price = min(zip(prices.values(), prices.keys()))\n",
    "max_price = max(zip(prices.values(), prices.keys()))\n",
    "\n",
    "min_price"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(612.78, 'AAPL')"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "max_price"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "类似的，可以使用`zip()`和`sorted()`函数来排列字典数据："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(10.75, 'FB'),\n",
       " (37.2, 'HPQ'),\n",
       " (45.23, 'ACME'),\n",
       " (205.55, 'IBM'),\n",
       " (612.78, 'AAPL')]"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prices_sorted = sorted(zip(prices.values(), prices.keys()))\n",
    "prices_sorted"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "需要注意的是`zip()`函数创建的是一个只能访问一次的迭代器。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(10.75, 'FB')\n"
     ]
    }
   ],
   "source": [
    "prices_and_names = zip(prices.values(), prices.keys())\n",
    "print(min(prices_and_names))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "max() arg is an empty sequence",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-67-a195e5a69c77>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmax\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mprices_and_names\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mValueError\u001b[0m: max() arg is an empty sequence"
     ]
    }
   ],
   "source": [
    "print(max(prices_and_names))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在一个字典上执行普通的数学运算，它们仅仅作用于键，而不是值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'AAPL'"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "min(prices)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "想要在字典的值集合上执行这些计算。或许会尝试着使用字典的`values()`方法来解决这个问题："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "10.75"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "min(prices.values())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可能还想要知道对应的键的信息(比如那种股票价格是最低的)。\n",
    "\n",
    "可以在`min()`和`max()`函数中提供`key`函数参数来获取最小值或最大值对应的键的信息。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'FB'"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "min(prices, key = lambda k: prices[k])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "但是，如果还想要得到最小值，你又得执行一次查找操作。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [],
   "source": [
    "min_value = prices[min(prices, key = lambda k: prices[k])]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "前面的`zip()`函数方案通过将字典“反转”为(值，键)元组序列来解决了上述问题。当比较两个元组的时候，值会先进行比较，然后才是键。这样的话你就能通过一条简单的语句就能很轻松的实现在字典上的求最值和排序操作了。\n",
    "\n",
    "需要注意的是在计算操作中使用到了(值，键)对。当多个实体拥有相同的值的时候，键会决定返回结果。比如，在执行`min()`和`max()`操作的时候，如果恰巧最小或最大值有重复的，那么拥有最小或最大键的实体会返回："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(45.23, 'AAA')"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prices = {'AAA': 45.23, 'ZZZ': 45.23}\n",
    "min(zip(prices.values(), prices.keys()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(45.23, 'ZZZ')"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "max(zip(prices.values(), prices.keys()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.9 查找两字典的相同点"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在两个字典中寻寻找相同点(比如相同的键、相同的值等等)。\n",
    "\n",
    "为了寻找两个字典的相同点，可以简单的在两字典的`keys()`或者`items()`方法返回结果上执行集合操作。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = {\n",
    "    'x': 1,\n",
    "    'y': 2,\n",
    "    'z': 3\n",
    "}\n",
    "b = {\n",
    "    'w': 10,\n",
    "    'x': 11,\n",
    "    'y': 2\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'x', 'y'}"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.keys() & b.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'z'}"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.keys() - b.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{('y', 2)}"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.items() & b.items()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这些操作也可以用于修改或者过滤字典元素。比如，假如你想以现有字典构造一个排除几个指定键的新字典。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'x': 1, 'y': 2}"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c = {key: a[key] for key in a.keys() - {'z', 'w'}}\n",
    "c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一个字典就是一个键集合与值集合的映射关系。字典的`keys()`方法返回一个展现键集合的键视图对象。键视图的一个很少被了解的特性就是它们也支持集合操作，比如集合并、交、差运算。所以，如果想对集合的键执行一些普通的集合操作，可以直接使用键视图对象而不用先将它们转换成一个`set`。\n",
    "\n",
    "字典的`items()`方法返回一个包含(键，值)对的元素视图对象。这个对象同样也支持集合操作，并且可以被用来查找两个字典有哪些相同的键值对。\n",
    "\n",
    "尽管字典的`values()`方法也是类似，但是它并不支持这里介绍的集合操作。某种程度上是因为值视图不能保证所有的值互不相同，这样会导致某些集合操作会出现问题。不过，如果硬要在值上面执行这些集合操作的话，可以先将值集合转换成set，然后再执行集合运算就行了。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.10 删除序列相同元素并保持顺序"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果序列上的值都是`hashable`类型，那么可以很简单的利用集合或者生成器来解决这个问题。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 5, 2, 9, 10]"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def dedupe(items):\n",
    "    seen = set()\n",
    "    for item in items:\n",
    "        if item not in seen:\n",
    "            yield item\n",
    "            seen.add(item)\n",
    "            \n",
    "a = [1, 5, 2, 1, 9, 1, 5, 10]\n",
    "list(dedupe(a))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这个方法仅仅在序列中元素为`hashable`的时候才管用。如果想消除元素不可哈希(比如`dict`类型)的序列中重复元素的话，需要将上述代码稍微改变一下，`key`参数指定了一个函数，将序列元素转换成`hashable`类型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def dedupe(items, key = None):\n",
    "    seen = set()\n",
    "    for item in items:\n",
    "        val = item if key is None else key(item)\n",
    "        if val not in seen:\n",
    "            yield item\n",
    "            seen.add(val)\n",
    "            \n",
    "a = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]\n",
    "list(dedupe(a, key = lambda d: (d['x'], d['y'])))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(dedupe(a, key = lambda d: d['x']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果想基于单个字段、属性或者某个更大的数据结构来消除重复元素，第二种方同样可以胜任。\n",
    "\n",
    "如果你仅仅就是想消除重复元素，通常可以简单的构造一个集合。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 5, 2, 1, 9, 1, 5, 10]"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = [1, 5, 2, 1, 9, 1, 5, 10]\n",
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1, 2, 5, 9, 10}"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "然而，这种方法不能维护元素的顺序，生成的结果中的元素位置被打乱。而上面的方法可以避免这种情况。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.11 命名切片"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有一段代码要从一个记录字符串中几个固定位置提取出特定的数据字段(比如文件或类似格式)："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "51325.0"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "record = '....................100 .......513.25 ..........'\n",
    "cost = int(record[20:23]) * float(record[31:37])\n",
    "cost"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为避免大量无法理解的硬编码下标，使得你的代码更加清晰可读，应这样命名切片："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "51325.0"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "SHARES = slice(20, 23)\n",
    "PRICE = slice(31, 37)\n",
    "cost = int(record[SHARES]) * float(record[PRICE])\n",
    "cost"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "代码中如果出现大量的硬编码下标值会使得可读性和可维护性大大降低。这里的解决方案是一个很简单的方法让你更加清晰的表达代码到底要做什么。\n",
    "\n",
    "内置的slice()函数创建了一个切片对象，可以被用在任何切片允许使用的地方。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[2, 3]"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "items = [0, 1, 2, 3, 4, 5, 6]\n",
    "a = slice(2, 4)\n",
    "items[2:4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[2, 3]"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "items[a]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 1, 10, 11, 4, 5, 6]"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "items[a] = [10, 11]\n",
    "items"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 1, 4, 5, 6]"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "del items[a]\n",
    "items"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果有一个切片对象`s`，可以分别调用它的`s.start`,`s.stop`,`s.step`属性来获取更多的信息。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = slice(5, 50, 2)\n",
    "s.start"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.stop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.step"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "你还能通过调用切片的`indices(size)`方法将它映射到一个确定大小的序列上，这个方法返回一个三元组`(start,stop,step)`，所有值都会被合适的缩小以满足边界限制，从而使用的时候避免出现`IndexError`异常。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2, 4, 1)"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = 'helloworld'\n",
    "a.indices(len(s))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "l\n",
      "l\n"
     ]
    }
   ],
   "source": [
    "for i in range(*a.indices(len(s))):\n",
    "    print(s[i])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.12 序列中出现次数最多的元素"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`collections.Counter`类就是专门为这类问题而设计的，它甚至有一个有用的`most_common()`方法直接给了你答案。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('eyes', 8), ('the', 5), ('look', 4)]\n"
     ]
    }
   ],
   "source": [
    "from collections import  Counter\n",
    "\n",
    "words = [\n",
    "    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes', 'the', 'eyes',\n",
    "    'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the', 'eyes', \"don't\", 'look',\n",
    "    'around', 'the', 'eyes', 'look', 'into', 'my', 'eyes', \"you're\", 'under'\n",
    "]\n",
    "\n",
    "word_counts = Counter(words)\n",
    "top_three = word_counts.most_common(3)\n",
    "print(top_three)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Counter`对象可以接受任意的`hashable`序列对象。在底层实现上，一个`Counter`对象就是一个字典，将元素映射到它出现的次数上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "word_counts['eyes']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "手动增加计数，可以用加法实现："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "morewords = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']\n",
    "for word in morewords:\n",
    "    word_counts[word] += 1\n",
    "\n",
    "word_counts['eyes']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "或者你可以使用`update()`方法："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [],
   "source": [
    "word_counts.update(morewords)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Counter`实例一个鲜为人知的特性是它们可以很容易的跟数学运算操作相结合。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Counter({'around': 2,\n",
       "         \"don't\": 1,\n",
       "         'eyes': 8,\n",
       "         'into': 3,\n",
       "         'look': 4,\n",
       "         'my': 3,\n",
       "         'not': 1,\n",
       "         'the': 5,\n",
       "         'under': 1,\n",
       "         \"you're\": 1})"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = Counter(words)\n",
    "b = Counter(morewords)\n",
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Counter({'are': 1,\n",
       "         'eyes': 1,\n",
       "         'in': 1,\n",
       "         'looking': 1,\n",
       "         'my': 1,\n",
       "         'not': 1,\n",
       "         'why': 1,\n",
       "         'you': 1})"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Counter({'are': 1,\n",
       "         'around': 2,\n",
       "         \"don't\": 1,\n",
       "         'eyes': 9,\n",
       "         'in': 1,\n",
       "         'into': 3,\n",
       "         'look': 4,\n",
       "         'looking': 1,\n",
       "         'my': 4,\n",
       "         'not': 2,\n",
       "         'the': 5,\n",
       "         'under': 1,\n",
       "         'why': 1,\n",
       "         'you': 1,\n",
       "         \"you're\": 1})"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c = a + b\n",
    "c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Counter({'around': 2,\n",
       "         \"don't\": 1,\n",
       "         'eyes': 7,\n",
       "         'into': 3,\n",
       "         'look': 4,\n",
       "         'my': 2,\n",
       "         'the': 5,\n",
       "         'under': 1,\n",
       "         \"you're\": 1})"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = a - b\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Counter`对象在几乎所有需要制表或者计数数据的场合是非常有用的工具。在解决这类问题的时候应该优先选择它，而不是手动的利用字典去实现。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.13 通过某个关键字排序一个字典列表"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过使用`operator`模块的`itemgetter`函数，可以非常容易的排序这样的数据结构。根据任意的字典字段来排序输入结果行是很容易实现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004},\n",
       " {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},\n",
       " {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},\n",
       " {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from operator import itemgetter\n",
    "\n",
    "rows = [\n",
    "    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},\n",
    "    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},\n",
    "    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},\n",
    "    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}\n",
    "]\n",
    "rows_by_fname = sorted(rows, key = itemgetter('fname'))\n",
    "rows_by_uid = sorted(rows, key = itemgetter('uid'))\n",
    "rows_by_fname"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001},\n",
       " {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},\n",
       " {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},\n",
       " {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rows_by_uid"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`itemgetter()`函数也支持多个`keys`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'fname': 'David', 'lname': 'Beazley', 'uid': 1002},\n",
       " {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},\n",
       " {'fname': 'Big', 'lname': 'Jones', 'uid': 1004},\n",
       " {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}]"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rows_by_lfname = sorted(rows, key = itemgetter('lname', 'fname'))\n",
    "rows_by_lfname"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`rows`被传递给接受一个关键字参数的`sorted()`内置函数。这个参数是`callable`类型，并且从`rows`中接受一个单一元素，然后返回被用来排序的值。`itemgetter()`函数就是负责创建这个callable对象的。\n",
    "\n",
    "`operator.itemgetter()`函数有一个被rows中的记录用来查找值的索引参数。可以是一个字典键名称，一个整形值或者任何能够传入一个对象的`__getitem__()`方法的值。如果传入多个索引参数给`itemgetter()`，它生成的`callable`对象会返回一个包含所有元素值的元组，并且`sorted()`函数会根据这个元组中元素顺序去排序。但想要同时在几个字段上面进行排序(比如通过姓和名来排序，也就是例子中的那样)的时候这种方法是很有用的。\n",
    "\n",
    "`itemgetter()`有时候也可以用`lambda`表达式代替。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004},\n",
       " {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},\n",
       " {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},\n",
       " {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rows_by_fname = sorted(rows, key = lambda r: r['fname'])\n",
    "rows_by_lfname = sorted(rows, key = lambda r: (r['lname'], r['fname']))\n",
    "rows_by_fname"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'fname': 'David', 'lname': 'Beazley', 'uid': 1002},\n",
       " {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},\n",
       " {'fname': 'Big', 'lname': 'Jones', 'uid': 1004},\n",
       " {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}]"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rows_by_lfname"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用`itemgetter()`方式会运行的稍微快点。如果对性能要求比较高的话就使用`itemgetter()`方式。\n",
    "\n",
    "最后，不要忘了这些也同样适用于`min()`和`max()`等函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "min(rows, key = itemgetter('uid'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "max(rows, key = itemgetter('uid'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.14 排序不支持原生比较的对象"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "内置的`sorted()`函数有一个关键字参数`key`，可以传入一个`callable`对象给它，这个`callable`对象对每个传入的对象返回一个值，这个值会被`sorted`用来排序这些对象。\n",
    "\n",
    "比如，如果在应用程序里面有一个`User`实例序列，并且希望通过他们的`user_id`属性进行排序，可以提供一个以`User`实例作为输入并输出对应`user_id`值的`callable`对象。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [],
   "source": [
    "class User:\n",
    "    \n",
    "    def __init__(self, user_id):\n",
    "        self.user_id = user_id\n",
    "        \n",
    "    def __repr__(self):\n",
    "        return 'User({})'.format(self.user_id)\n",
    "    \n",
    "def sort_notcompare():\n",
    "    users = [User(23), User(3), User(99)]\n",
    "    print(users)\n",
    "    print(sorted(users, key = lambda u: u.user_id))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "另外一种方式是使用`operator.attrgetter()`来代替`lambda`函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[User(3), User(23), User(99)]"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from operator import attrgetter\n",
    "\n",
    "users = [User(23), User(3), User(99)]\n",
    "sorted(users, key = attrgetter('user_id'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "选择使用`lambda`函数或者是`attrgetter()`可能取决于个人喜好。但是，`attrgetter()`函数通常会运行的快点，并且还能同时允许多个字段进行比较。这个跟`operator.itemgetter()`函数作用于字典类型很类似。\n",
    "\n",
    "例如，如果`User`实例还有一个`first_name`和`last_name`属性，那么可以向下面这样排序："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "by_name = sorted(users, key = attrgetter('last_name', 'first_name'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后，不要忘了这些也同样适用于`min()`和`max()`等函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "User(3)"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "min(users, key = attrgetter('user_id'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "User(99)"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "max(users, key = attrgetter('user_id'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.15 通过某个字段将记录分组"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有一个字典或者实例的序列，然后想根据某个特定的字段比如`date`来分组迭代访问。`itertools.groupby()`函数对于这样的数据分组操作非常实用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "07/01/2012\n",
      "  {'address': '5412 N CLARK', 'date': '07/01/2012'}\n",
      "  {'address': '4801 N BROADWAY', 'date': '07/01/2012'}\n",
      "07/02/2012\n",
      "  {'address': '5800 E 58TH', 'date': '07/02/2012'}\n",
      "  {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}\n",
      "  {'address': '1060 W ADDISON', 'date': '07/02/2012'}\n",
      "07/03/2012\n",
      "  {'address': '2122 N CLARK', 'date': '07/03/2012'}\n",
      "07/04/2012\n",
      "  {'address': '5148 N CLARK', 'date': '07/04/2012'}\n",
      "  {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}\n"
     ]
    }
   ],
   "source": [
    "from operator import itemgetter\n",
    "from itertools import groupby\n",
    "\n",
    "rows = [\n",
    "    {'address': '5412 N CLARK', 'date': '07/01/2012'},\n",
    "    {'address': '5148 N CLARK', 'date': '07/04/2012'},\n",
    "    {'address': '5800 E 58TH', 'date': '07/02/2012'},\n",
    "    {'address': '2122 N CLARK', 'date': '07/03/2012'},\n",
    "    {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},\n",
    "    {'address': '1060 W ADDISON', 'date': '07/02/2012'},\n",
    "    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},\n",
    "    {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},\n",
    "]\n",
    "\n",
    "rows.sort(key = itemgetter('date'))\n",
    "for date, items in groupby(rows, key = itemgetter('date')):\n",
    "    print(date)\n",
    "    for i in items:\n",
    "        print(' ', i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`groupby()`函数扫描整个序列并且查找连续相同值(或者根据指定`key`函数返回值相同)的元素序列。在每次迭代的时候，它会返回一个值和一个迭代器对象，这个迭代器对象可以生成元素值全部等于上面那个值的组中所有对象。\n",
    "\n",
    "一个非常重要的准备步骤是要根据指定的字段将数据排序。因为`groupby()`仅仅检查连续的元素，如果事先并没有排序完成的话，分组函数将得不到想要的结果。\n",
    "\n",
    "如果仅仅只是想根据`date`字段将数据分组到一个大的数据结构中去，并且允许随机访问，那么最好使用`defaultdict()`来构建一个多值字典。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'address': '5412 N CLARK', 'date': '07/01/2012'}\n",
      "{'address': '4801 N BROADWAY', 'date': '07/01/2012'}\n"
     ]
    }
   ],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "rows_by_date = defaultdict(list)\n",
    "for row in rows:\n",
    "    rows_by_date[row['date']].append(row)\n",
    "    \n",
    "for r in rows_by_date['07/01/2012']:\n",
    "    print(r)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上面这个例子中，没必要先将记录排序。因此，如果对内存占用不是很关心，这种方式会比先排序然后再通过`groupby()`函数迭代的方式运行得快一些"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.16 过滤序列元素"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有一个数据序列，想利用一些规则从中提取出需要的值或者是缩短序列，最简单的过滤序列元素的方法就是使用列表推导。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 4, 10, 2, 3]"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mylist = [1, 4, -5, 10, -7, 2, 3, -1]\n",
    "[n for n in mylist if n > 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[-5, -7, -1]"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[n for n in mylist if n < 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用列表推导的一个潜在缺陷就是如果输入非常大的时候会产生一个非常大的结果集，占用大量内存。如果对内存比较敏感，那么可以使用生成器表达式迭代产生过滤的元素。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<generator object <genexpr> at 0x00C6EA80>"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pos = (n for n in mylist if n > 0)\n",
    "pos"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "4\n",
      "10\n",
      "2\n",
      "3\n"
     ]
    }
   ],
   "source": [
    "for x in pos:\n",
    "    print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有时候，过滤规则比较复杂，不能简单的在列表推导或者生成器表达式中表达出来。比如，假设过滤的时候需要处理一些异常或者其他复杂情况。这时候你可以将过滤代码放到一个函数中，然后使用内建的`filter()`函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['1', '2', '-3', '4', '5']\n"
     ]
    }
   ],
   "source": [
    "values = ['1', '2', '-3', '-', '4', 'N/A', '5']\n",
    "def is_int(val):\n",
    "    try:\n",
    "        x = int(val)\n",
    "        return True\n",
    "    except ValueError:\n",
    "        return False\n",
    "    \n",
    "ivals = list(filter(is_int, values))\n",
    "print(ivals)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`filter()`函数创建了一个迭代器，因此如果想得到一个列表的话，就得像示例那样使用`list()`去转换。\n",
    "\n",
    "列表推导和生成器表达式通常情况下是过滤数据最简单的方式。其实它们还能在过滤的时候转换数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1.0, 2.0, 3.1622776601683795, 1.4142135623730951, 1.7320508075688772]"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mylist = [1, 4, -5, 10, -7, 2, 3, -1]\n",
    "import math\n",
    "[math.sqrt(n) for n in mylist if n > 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "过滤操作的一个变种就是将不符合条件的值用新的值代替，而不是丢弃它们。比如，在一列数据中可能不仅想找到正数，而且还想将不是正数的数替换成指定的数。通过将过滤条件放到条件表达式中去，可以很容易的解决这个问题。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 4, 0, 10, 0, 2, 3, 0]"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clip_neg = [n if n > 0 else 0 for n in mylist]\n",
    "clip_neg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "另外一个值得关注的过滤工具就是`itertools.compress()`，它以一个`iterable`对象和一个相对应的`Boolean`选择器序列作为输入参数。然后输出`iterable`对象中对应选择器为`True`的元素。当需要用另外一个相关联的序列来过滤某个序列的时候，这个函数是非常有用的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[False, False, True, False, False, True, True, False]"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "addresses = [\n",
    "    '5412 N CLARK',\n",
    "    '5148 N CLARK',\n",
    "    '5800 E 58TH',\n",
    "    '2122 N CLARK',\n",
    "    '5645 N RAVENSWOOD',\n",
    "    '1060 W ADDISON',\n",
    "    '4801 N BROADWAY',\n",
    "    '1039 W GRANVILLE',\n",
    "]\n",
    "counts = [0, 3, 10, 4, 1, 7, 6, 1]\n",
    "\n",
    "from itertools import compress\n",
    "more5 = [n > 5 for n in counts]\n",
    "more5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']"
      ]
     },
     "execution_count": 127,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(compress(addresses, more5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里的关键点在于先创建一个`Boolean`序列，指示哪些元素复合条件。然后`compress()`函数根据这个序列去选择输出对应位置为`True`的元素。\n",
    "\n",
    "和`filter()`函数类似，`compress()`也是返回的一个迭代器。因此，如果需要得到一个列表，那么需要使用`list()`来将结果转换为列表类型。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.17 从字典中提取子集"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "构造一个字典，它是另外一个字典的子集。最简单的方式是使用字典推导。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [],
   "source": [
    "prices = {\n",
    "    'ACME': 45.23,\n",
    "    'AAPL': 612.78,\n",
    "    'IBM': 205.55,\n",
    "    'HPQ': 37.20,\n",
    "    'FB': 10.75\n",
    "}\n",
    "p1 = {key: value for key, value in prices.items() if value > 200}\n",
    "tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}\n",
    "p2 = {key: value for key, value in prices.items() if key in tech_names}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "大多数情况下字典推导能做到的，通过创建一个元组序列然后把它传给`dict()`函数也能实现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [],
   "source": [
    "p1 = dict({key: value for key, value in prices.items() if value > 200})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "字典推导方式表意更清晰，并且实际上也会运行的更快些(在这个例子中，实际测试几乎比`dcit()`函数方式快整整一倍)。\n",
    "\n",
    "有时候完成同一件事会有多种方式。第二个例子程序也可以像这样重写："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [],
   "source": [
    "tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}\n",
    "p2 = {key: prices[key] for key in prices.keys() & tech_names}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "但是运行时间测试结果显示这种方案大概比第一种方案慢1.6倍。如果对程序运行性能要求比较高的话，需要花点时间去做计时测试。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.18 映射名称到序列元素"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有一段通过下标访问列表或者元组中元素的代码，但是这样有时候会使得代码难以阅读，于是想通过名称来访问元素。\n",
    "\n",
    "`collections.namedtuple()`函数通过使用一个普通的元组对象来解决这个问题。这个函数实际上是一个返回Python中标准元组类型子类的一个工厂方法。需要传递一个类型名和需要的字段给它，然后它就会返回一个类，可以初始化这个类，为定义的字段传递值等。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Subscriber(addr='joney@example.com', joined='2012-10-19')"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import namedtuple\n",
    "subscriber = namedtuple('Subscriber', ['addr', 'joined'])\n",
    "sub = subscriber('joney@example.com', '2012-10-19')\n",
    "sub"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'joney@example.com'"
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sub.addr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2012-10-19'"
      ]
     },
     "execution_count": 133,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sub.joined"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "尽管`namedtuple`的实例看起来像一个普通的类实例，但是它跟元组类型是可交换的，支持所有的普通元组操作，比如索引和解压。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(sub)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'joney@example.com'"
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "addr, joined = sub\n",
    "addr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2012-10-19'"
      ]
     },
     "execution_count": 136,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "joined"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "命名元组的一个主要用途是将代码从下标操作中解脱出来。因此，如果从数据库调用中返回了一个很大的元组列表，通过下标去操作其中的元素，当在表中添加了新的列的时候代码可能就会出错了。但是如果使用了命名元组，那么就不会有这样的顾虑。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_cost(records):\n",
    "    total = 0.0\n",
    "    for rec in records:\n",
    "        total += rec[1] * rec[2]\n",
    "    return total"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下标操作通常会让代码表意不清晰，并且非常依赖记录的结构。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import namedtuple\n",
    "\n",
    "stock = namedtuple('Stock', ['name', 'shared', 'price'])\n",
    "def compute_cost(records):\n",
    "    total = 0.0\n",
    "    for rec in records:\n",
    "        s = stock(*rec)\n",
    "    total += s.shared * s.price\n",
    "    return total"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "命名元组另一个用途就是作为字典的替代，因为字典存储需要更多的内存空间。如果需要构建一个非常大的包含字典的数据结构，那么使用命名元组会更加高效。但是需要注意的是，不像字典那样，一个命名元组是不可更改的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Stock(name='acem', shared=100, price=123.45)"
      ]
     },
     "execution_count": 139,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = stock('acem', 100, 123.45)\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "metadata": {},
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "can't set attribute",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-140-1d6a371afdd9>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0ms\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mshared\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m75\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mAttributeError\u001b[0m: can't set attribute"
     ]
    }
   ],
   "source": [
    "s.shared = 75"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果真的需要改变然后的属性，那么可以使用命名元组实例的`_replace()`方法，它会创建一个全新的命名元组并将对应的字段用新的值取代。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Stock(name='acem', shared=75, price=123.45)"
      ]
     },
     "execution_count": 141,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = s._replace(shared = 75)\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`_replace()`方法还有一个很有用的特性就是当命名元组拥有可选或者缺失字段时候，它是一个非常方便的填充数据的方法。可以先创建一个包含缺省值的原型元组，然后使用`_replace()`方法创建新的值被更新过的实例。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "stock(name='acme', shares=100, price=123.45, date=None, time=None)"
      ]
     },
     "execution_count": 142,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import namedtuple\n",
    "\n",
    "stock = namedtuple('stock', ['name', 'shares', 'price', 'date', 'time'])\n",
    "stock_prototype = stock('', 0, 0.0, None, None)\n",
    "\n",
    "def dict_to_stock(s):\n",
    "    return stock_prototype._replace(**s)\n",
    "\n",
    "a = {'name': 'acme', 'shares': 100, 'price': 123.45}\n",
    "dict_to_stock(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "stock(name='acme', shares=100, price=123.45, date='12/17/2012', time=None)"
      ]
     },
     "execution_count": 143,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b = {'name': 'acme', 'shares': 100, 'price': 123.45, 'date': '12/17/2012'}\n",
    "dict_to_stock(b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果目标是定义一个需要更新很多实例属性的高效数据结构，那么命名元组并不是最佳选择。这时候应该考虑定义一个包含`__slots__`方法的类。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.19 转换并同时计算数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "需要在数据序列上执行聚集函数(比如`sum()`,`min()`,`max()`)，但是首先需要先转换或者过滤数据。\n",
    "\n",
    "一个非常优雅的方式去结合数据计算与转换就是使用一个生成器表达式参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 144,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "55"
      ]
     },
     "execution_count": 144,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nums = [1, 2, 3, 4, 5]\n",
    "s = sum(x * x for x in nums)\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用一个生成器表达式作为参数会比先创建一个临时列表更加高效和优雅。比如，如果不使用生成器表达式的话，可能会考虑使用下面的实现方式："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 145,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = sum([x * x for x in nums])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这种方式同样可以达到想要的效果，但是它会多一个步骤，先创建一个额外的列表。对于小型列表可能没什么关系，但是如果元素数量非常大的时候，它会创建一个巨大的仅仅被使用一次就被丢弃的临时数据结构。而生成器方案会以迭代的方式转换数据，因此更省内存。\n",
    "\n",
    "在使用一些聚集函数比如min()和max()的时候可能更加倾向于使用生成器版本，它们接受的一个key关键字参数或许对你很有帮助。比如，在上面的证券例子中，可能会考虑下面的实现版本："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "metadata": {},
   "outputs": [],
   "source": [
    "min_shares = min(s['shares'] for s in portfolio)\n",
    "min_shares = min(portfolio, key = lambda s: s['shares'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.20 合并多个字典或映射"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在有多个字典或者映射，想将它们从逻辑上合并为一个单一的映射后执行某些操作，比如查找值或者检查某些键是否存在。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = {'x': 1, 'z': 3}\n",
    "b = {'y': 2, 'z': 4}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在假设必须在两个字典中执行查找操作(比如先从`a`中找，如果找不到再在`b`中找)。一个非常简单扼解决方案就是使用`collections`模块中的`ChainMap`类。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 148,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "2\n",
      "3\n"
     ]
    }
   ],
   "source": [
    "from collections import ChainMap\n",
    "\n",
    "c = ChainMap(a, b)\n",
    "print(c['x'])\n",
    "print(c['y'])\n",
    "print(c['z'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一个`ChainMap`接受多个字典并将它们在逻辑上变为一个字典。然后，这些字典并不是真的合并在一起了，`ChainMap`类只是在内部创建了一个容纳这些字典的列表并重新定义了一些常见的字典操作来遍历这个列表。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 149,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 149,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['x', 'z', 'y']"
      ]
     },
     "execution_count": 150,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(c.keys())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 3, 2]"
      ]
     },
     "execution_count": 151,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(c.values())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果出现重复键，那么第一次出现的映射值会被返回。因此，例子程序中的`c['z']`总是会返回字典`a`中对应的值，而不是`b`中对应的值。\n",
    "\n",
    "对于字典的更新或删除操作总是影响的是列表中第一个字典。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'w': 40, 'z': 10}"
      ]
     },
     "execution_count": 152,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c['z'] = 10\n",
    "c['w'] = 40\n",
    "del c['x']\n",
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`ChainMap`对于编程语言中的作用范围变量(比如`globals`,`locals`等)是非常有用的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ChainMap({'x': 3}, {'x': 2}, {'x': 1})"
      ]
     },
     "execution_count": 154,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "values = ChainMap()\n",
    "values['x'] = 1\n",
    "values = values.new_child()\n",
    "values['x'] = 2\n",
    "values = values.new_child()\n",
    "values['x'] = 3\n",
    "values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 155,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "values['x']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 156,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 156,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "values = values.parents\n",
    "values['x']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 157,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "values = values.parents\n",
    "values['x']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 158,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ChainMap({'x': 1})"
      ]
     },
     "execution_count": 158,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "作为`ChainMap`的替代，你可能会考虑使用`update()`方法将两个字典合并。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 159,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 159,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = {'x': 1, 'z': 3}\n",
    "b = {'y': 2, 'z': 4}\n",
    "merged = dict(b)\n",
    "merged.update(a)\n",
    "merged['x']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 160,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "merged['y']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 161,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 161,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "merged['z']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这样也能行得通，但是它需要你创建一个完全不同的字典对象(或者是破坏现有字典结构)。同时，如果原字典做了更新，这种改变不会反应到新的合并字典中去。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 162,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 162,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a['x'] = 13\n",
    "merged['x']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`ChianMap`使用原来的字典，它自己不创建新的字典。所以它并不会产生上面所说的结果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 163,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 163,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = {'x': 1, 'z': 3}\n",
    "b = {'y': 2, 'z': 4}\n",
    "merged = ChainMap(a, b)\n",
    "merged['x']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "42"
      ]
     },
     "execution_count": 164,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a['x'] = 42\n",
    "merged['x']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "EOF"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
